Mahdi Vasighi

Assistant Professor, Department of Computer Science and Information Technology

Institute for Advanced Studies in Basic Sciences (IASBS), No. 444, Prof. Yousef Sobouti Blvd., Zanjan 45137-66731, Iran


Contact information

vasighi iasbs.ac.ir

vasighi gmail.com

+98 24 3315 3378




The International Conference on Contemporary Issues In Data Science


March 5-8, 2019

Institute for Advanced Studies in Basic Sciences (IASBS)


Datasets

VK4752

This dataset was prepared from the Class, Architecture, Topology and Homology (CATH) database for the structural classification of proteins. Sequences from the Protein Data Bank (PDB) were incorporated to reconstruct entire chains. The curated dataset includes only single-chain monomeric proteins with at most 40% sequence identity. Proteins whose structures were not solved by X-ray diffraction, or that were shorter than 30 amino acids, were filtered out. The final curated dataset contains 4752 unique sequences in four classes: mainly-α, mainly-β, mixed α+β, and few secondary structure (fss).
Relevant paper: Amino Acids 49 (2017) 261-271.
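
The filtering criteria described above can be illustrated with a minimal Python sketch. It is not the code used to build VK4752: the record layout (`ProteinEntry` and its field names), the method label string, and the helper names `passes_filters` and `curate` are assumptions made for illustration. The 40% sequence-identity reduction is not reproduced here, since it requires a sequence clustering step.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_LENGTH = 30  # chains shorter than this are filtered out


@dataclass
class ProteinEntry:
    # Hypothetical record layout; field names are illustrative only.
    entry_id: str     # placeholder identifier
    sequence: str     # full chain sequence, one letter per residue
    n_chains: int     # number of chains in the structure
    method: str       # experimental method label (assumed free text)
    cath_class: str   # e.g. "mainly-alpha", "mainly-beta", "mixed", "fss"


def passes_filters(entry: ProteinEntry) -> bool:
    """Return True if the entry meets the three stated criteria."""
    return (
        entry.n_chains == 1                               # single-chain only
        and entry.method.upper() == "X-RAY DIFFRACTION"   # X-ray structures only
        and len(entry.sequence) >= MIN_LENGTH             # at least 30 residues
    )


def curate(entries: Iterable[ProteinEntry]) -> List[ProteinEntry]:
    """Keep only the entries that pass every filter, preserving order."""
    return [e for e in entries if passes_filters(e)]
```

Calling `curate` on a list of such records returns the subset that would remain before any redundancy reduction.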

Download Dataset